SOBOLEV TESTS OF GOODNESS OF FIT OF DISTRIBUTIONS 
ON COMPACT RIEMANNIAN MANIFOLDS 

By P. E. Jupp 

University of St. Andrews 

Classes of coordinate-invariant omnibus goodness-of-fit tests on 
compact Riemannian manifolds are proposed. The tests are based 
on Giné's Sobolev tests of uniformity. A condition for consistency is 
given. The tests are illustrated by an example on the rotation group 
$SO(3)$. 

1. Introduction. Although many tests of goodness of fit are available for 
distributions on the circle, comparatively little work has been done on 
developing general tests of goodness of fit on spheres and other sample spaces 
used in directional statistics. Goodness-of-fit tests for specific models include 
score tests for Fisher distributions within the Kent family [11], Bingham 
distributions within the Fisher-Bingham family [11], and for von Mises-Fisher 
distributions within the Fisher-Bingham family [13], as well as omnibus tests 
for Fisher distributions [6] and for Watson distributions [2]. An overview is 
given in Section 12.3 of [14]. The only general work on goodness-of-fit tests 
for directional distributions appears to be that of Beran [1] and of Boulerice 
and Ducharme [3]. Beran introduced Wald-type tests for certain nested 
exponential models on spheres, whereas Boulerice and Ducharme considered 
score tests of goodness of fit of distributions on spheres and projective spaces. 
Neither Beran's tests nor those of Boulerice and Ducharme are consistent 
against all alternatives. 

For continuous distributions on the real line or the circle, the probability 
integral transform can be used to derive a test of goodness of fit from each 
test of uniformity. However, if the sample space is a manifold of dimension 
greater than 1, then there is no unique coordinate-invariant analogue of the 
probability integral transform, so that it is not obvious how one can obtain 
tests of goodness of fit from tests of uniformity. The purpose of this paper 
is to use the machinery of Giné's [7] Sobolev tests of uniformity to obtain 
coordinate-invariant omnibus tests of goodness of fit on arbitrary compact 
Riemannian manifolds. This is in the spirit of the adaptations of Sobolev 
tests of uniformity by Wellner [17] to get two-sample tests and by Jupp 
and Spurr [9, 10] to get tests of symmetry and tests of independence. For a 
large class of Sobolev tests of uniformity (those which are consistent against 
all alternatives), the corresponding tests of goodness of fit are consistent 
against all alternatives. Section 2 recalls Giné's Sobolev tests of uniformity. 
In Section 3 Sobolev tests of goodness of fit are introduced and their basic 
properties are given. A numerical example on the rotation group $SO(3)$ is 
presented in Section 4. 

2. Sobolev tests of uniformity. Let $M$ be a compact Riemannian 
manifold. The Riemannian metric determines the uniform probability measure 
$\mu$ on $M$. The intuitive idea of the Sobolev tests of uniformity is to map the 
manifold $M$ into the Hilbert space $L^2(M,\mu)$ of square-integrable functions 
on $M$ by a function $t : M \to L^2(M,\mu)$ such that, if $x$ is uniformly 
distributed, then the mean of $t(x)$ is 0. 

The standard way of constructing such mappings $t$ is due to Giné [7] and 
is based on the eigenfunctions of the Laplacian operator on $M$. For $k \ge 1$, let 
$E_k$ denote the space of eigenfunctions corresponding to the $k$th eigenvalue, 
and put $d(k) = \dim E_k$. Then there is a well-defined map $t_k$ of $M$ into $E_k$ 
given by 

\[ t_k(x) = \sum_{i=1}^{d(k)} f_i(x)\, f_i, \]

where $\{f_i : 1 \le i \le d(k)\}$ is any orthonormal basis of $E_k$. If 
$\{a_1, a_2, \ldots\}$ is a sequence of real numbers such that 

\[ \sum_{k=1}^{\infty} a_k^2\, d(k) < \infty, \tag{2.1} \]

then 

\[ x \mapsto t(x) = \sum_{k=1}^{\infty} a_k\, t_k(x) \tag{2.2} \]

defines a mapping $t$ of $M$ into $L^2(M,\mu)$. The resulting Sobolev statistic 
evaluated on observations $x_1, \ldots, x_n$ on $M$ is 

\[ T_n = \frac{1}{n} \Bigl\| \sum_{i=1}^{n} t(x_i) \Bigr\|^2
       = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} \langle t(x_i), t(x_j) \rangle, \]

where $\langle \cdot, \cdot \rangle$ denotes the inner product on $L^2(M,\mu)$ given by 

\[ \langle f, g \rangle = \int_M f(x)\, g(x)\, d\mu(x), \]

the integration being with respect to the uniform probability measure $\mu$ on 
$M$. The corresponding Sobolev test rejects uniformity for large values of $T_n$. 
The main properties of $T_n$ are the following: 

(i) It is defined without recourse to a coordinate system. 

(ii) It is invariant under isometries of $M$. 

(iii) Its large-sample asymptotic distribution under uniformity is that of 
a weighted sum of independent $\chi^2$ distributions. 

(iv) The corresponding test is consistent against all alternatives if and 
only if $a_k \ne 0$ for all $k$. 

Further details can be found in [7]. A brief outline of Sobolev tests on spheres 
is given in Section 10.8 of [14]. Many well-known tests of uniformity are 
Sobolev tests. 
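
As an illustration that is not part of the original development, take $M$ to be 
the unit circle, with observations given as angles in radians. There the $k$th 
eigenspace is spanned by $\sqrt{2}\cos kx$ and $\sqrt{2}\sin kx$, so that 
$\langle t_k(x), t_k(y) \rangle = 2\cos k(x-y)$ and $T_n$ becomes a double sum of cosines. 
The Python sketch below uses hypothetical function names and assumes that only 
finitely many $a_k$ are nonzero. It checks that the choice $a_1 = 1$, $a_k = 0$ for 
$k \ge 2$ gives the Rayleigh statistic $2n\bar{R}^2$, and that this choice cannot detect 
an antipodally symmetric sample, in line with property (iv). 

```python
import math

def sobolev_statistic_circle(angles, a):
    """T_n = (1/n) sum_i sum_j <t(x_i), t(x_j)> on the unit circle.

    Assumption: a = [a_1, ..., a_K] and a_k = 0 for k > K.
    On the circle <t_k(x), t_k(y)> = 2 cos(k (x - y)).
    """
    n = len(angles)
    total = 0.0
    for xi in angles:
        for xj in angles:
            total += sum(2.0 * ak ** 2 * math.cos(k * (xi - xj))
                         for k, ak in enumerate(a, start=1))
    return total / n

def rayleigh_statistic_circle(angles):
    """Rayleigh statistic 2 n Rbar^2, Rbar being the mean resultant length."""
    n = len(angles)
    c = sum(math.cos(x) for x in angles) / n
    s = sum(math.sin(x) for x in angles) / n
    return 2.0 * n * (c * c + s * s)

if __name__ == "__main__":
    sample = [0.3, 1.1, 2.0, 4.5]
    print(sobolev_statistic_circle(sample, [1.0]),
          rayleigh_statistic_circle(sample))
    # An antipodally symmetric sample has zero first trigonometric moment.
    antipodal = [0.3, 0.3 + math.pi, 1.1, 1.1 + math.pi]
    print(sobolev_statistic_circle(antipodal, [1.0]))       # numerically zero
    print(sobolev_statistic_circle(antipodal, [0.0, 1.0]))  # clearly positive
```

Including a nonzero $a_2$ restores sensitivity to the antipodal sample, which is 
the finite-sample counterpart of requiring $a_k \ne 0$ for all $k$. 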



3. Sobolev tests of goodness of fit. 



3.1. Weighted Sobolev statistics. Let $\mathcal{F} = \{f(\cdot\,;\theta) : \theta \in \Theta\}$ be a family 
of probability density functions on $M$, where the parameter space $\Theta$ is a 
$p$-dimensional manifold. The null hypothesis to be tested is that the 
probability density function of the distribution generating the data is in 
$\mathcal{F}$. Let $\hat\theta$ denote the estimate of $\theta$ obtained from independent 
observations $x_1, \ldots, x_n$ by means of an estimating function 
$\psi : M \times \Theta \to \mathbb{R}^p$, that is, $\hat\theta$ is the root (assumed unique) of 

\[ \sum_{i=1}^{n} \psi(x_i; \hat\theta) = 0. \]



The intuitive idea behind the Sobolev goodness-of-fit tests to be introduced 
here is that under the null hypothesis $\hat\theta$ is close to $\theta$, so that the expectation 

\[ E_\theta\Bigl[ \frac{1}{f(x;\hat\theta)}\, t(x) \Bigr] \]

is near 0, and so therefore is its sample analogue 

\[ \frac{1}{n} \sum_{i=1}^{n} \frac{1}{f(x_i;\hat\theta)}\, t(x_i). \]

The closeness of the latter to 0 can be measured by the weighted Sobolev 
statistic 

\[ T_w = \frac{1}{n} \Bigl\| \sum_{i=1}^{n} \frac{1}{f(x_i;\hat\theta)}\, t(x_i) \Bigr\|^2. \tag{3.1} \]

Thus, $T_w$ is obtained by applying a Sobolev test of uniformity not to the 
empirical distribution but to the weighted empirical distribution in which 
each observation $x_i$ is weighted by the reciprocal of the value $f(x_i;\hat\theta)$ of the 
fitted density at that point. The null hypothesis is rejected for large values 
of $T_w$. Significance can be assessed using Monte Carlo simulation from the 
fitted distribution. 
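
The Monte Carlo assessment of significance can be sketched as follows. This is 
a minimal Python sketch under stated assumptions, not a prescription from the 
text: the callables `fit`, `simulate` and `statistic` are hypothetical 
placeholders supplied by the user, the parameter is re-estimated on every 
simulated sample, and the p-value uses the common $(r+1)/(B+1)$ convention. 

```python
import random

def monte_carlo_p_value(data, fit, simulate, statistic, n_sim=1000, seed=0):
    """Parametric Monte Carlo p-value for a goodness-of-fit statistic.

    fit(data) -> theta_hat                 (hypothetical estimator)
    simulate(theta, n, rng) -> sample      (draws n points from f(.; theta))
    statistic(sample, theta) -> float      (e.g. a weighted Sobolev statistic)
    """
    rng = random.Random(seed)
    theta_hat = fit(data)
    observed = statistic(data, theta_hat)
    exceed = 0
    for _ in range(n_sim):
        sim = simulate(theta_hat, len(data), rng)
        theta_sim = fit(sim)  # assumption: refit on each simulated sample
        if statistic(sim, theta_sim) >= observed:
            exceed += 1
    # Assumption: (r + 1) / (B + 1) convention, which is never exactly zero.
    return (exceed + 1) / (n_sim + 1)

if __name__ == "__main__":
    # Toy check with a statistic that ignores the data: p-value is 1.
    p = monte_carlo_p_value(
        [1.0, 2.0],
        fit=lambda d: 0.0,
        simulate=lambda th, n, rng: [rng.random() for _ in range(n)],
        statistic=lambda d, th: 0.0,
        n_sim=50,
    )
    print(p)
```

Refitting inside the loop mimics the dependence of $T_w$ on $\hat\theta$; omitting it 
would treat the fitted parameter as known. 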

The weighted Sobolev statistic $T_w$ can also be written as 

\[ T_w = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n}
         \frac{1}{f(x_i;\hat\theta)\, f(x_j;\hat\theta)}\, \langle t(x_i), t(x_j) \rangle, \]

which is often suitable for computation. 
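
To make the double-sum form concrete, the following Python sketch evaluates a 
weighted Sobolev statistic on the unit circle, where 
$\langle t_k(x), t_k(y) \rangle = 2\cos k(x-y)$. The function name and the cardioid density 
in the usage lines are illustrative assumptions; the fitted density is taken 
with respect to the uniform probability measure, so the uniform density is 
identically 1, and only finitely many $a_k$ are assumed nonzero. 

```python
import math

def weighted_sobolev_circle(angles, fitted_density, a):
    """(1/n) sum_i sum_j <t(x_i), t(x_j)> / (f(x_i) f(x_j)) on the circle.

    fitted_density: callable x -> f(x; theta_hat), a density with respect
    to the uniform probability measure (assumption of this sketch).
    a: [a_1, ..., a_K], with a_k = 0 for k > K.
    """
    n = len(angles)
    weights = [1.0 / fitted_density(x) for x in angles]
    total = 0.0
    for xi, wi in zip(angles, weights):
        for xj, wj in zip(angles, weights):
            kernel = sum(2.0 * ak ** 2 * math.cos(k * (xi - xj))
                         for k, ak in enumerate(a, start=1))
            total += wi * wj * kernel
    return total / n

if __name__ == "__main__":
    sample = [0.3, 1.1, 2.0, 4.5]
    # Hypothetical fitted cardioid density 1 + 0.6 cos(x - 1.0).
    cardioid = lambda x: 1.0 + 0.6 * math.cos(x - 1.0)
    print(weighted_sobolev_circle(sample, cardioid, [1.0, 0.5]))
```

With `fitted_density` identically 1 the function returns the unweighted 
Sobolev statistic of uniformity for the same $a_k$. 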



Remark 1. Any direct sum decomposition $L^2(M,\mu) = E_1 \oplus E_2$ with 
$E_1$ and $E_2$ orthogonal in $L^2(M,\mu)$ yields a decomposition $t = t_1 + t_2$ with 
$t_j(M) \subset E_j$ for $j = 1, 2$, and so 

\[ T_w = T_{w1} + T_{w2}, \]

where 

\[ T_{wj} = \frac{1}{n} \Bigl\| \sum_{i=1}^{n} \frac{1}{f(x_i;\hat\theta)}\, t_j(x_i) \Bigr\|^2
   \qquad \text{for } j = 1, 2. \]

Note that $T_{w1}$ and $T_{w2}$ are not necessarily asymptotically independent under 
the null hypothesis. Any group $G$ of isometries of $M$ gives such a direct sum 
decomposition $L^2(M,\mu) = E_{M/G} \oplus E_G$ with 

\[ E_{M/G} = \{ f \in L^2(M,\mu) : f(gx) = f(x),\ x \in M,\ g \in G \}, \]
\[ E_G = \Bigl\{ f \in L^2(M,\mu) : \int_G f(gx)\, d\lambda(g) = 0 \Bigr\}, \]

where $\lambda$ is the uniform probability measure on $G$. If the $f(\cdot\,;\theta)$ are 
invariant under $G$, in that 

\[ f(gx;\theta) = f(x;\theta) \qquad \text{for } x \in M,\ \theta \in \Theta,\ g \in G, \]

then the component $T_{M/G}$ of $T_w$ obtained from $E_{M/G}$ measures the goodness 
of fit of the data to the corresponding distribution on the quotient space 
$M/G$, while the component $T_G$ obtained from $E_G$ measures the lack of 
symmetry under $G$. 



Remark 2. Beran's [1] goodness-of-fit tests on spheres can easily be 
generalized to general compact Riemannian manifolds as follows. Let $E_1$ 
and $E_2$ be orthogonal finite-dimensional subspaces of $L^2(M,\mu)$ which are 
invariant under isometries of $M$. Consider the exponential model with 
probability density functions of the form 

\[ f(x;\theta_1,\theta_2) = \exp\{ \langle \theta_1, t_1(x) \rangle + \langle \theta_2, t_2(x) \rangle - \kappa(\theta_1,\theta_2) \},
   \qquad x \in M,\ \theta_j \in E_j, \tag{3.2} \]

where $t_j : M \to E_j$ for $j = 1, 2$ and $\kappa(\theta_1,\theta_2)$ is the normalizing constant. 
Then Beran's test of goodness of fit of the model obtained by putting $\theta_2 = 0$ 
in (3.2) rejects this hypothesis for large values of 
$\langle \hat\theta_2, S_{22.1}^{-1} \hat\theta_2 \rangle$, where $\hat\theta_2$ is a suitable estimate of $\theta_2$ and 
$S_{22.1}^{-1}$ is the (2,2)-part of the inverse of the sample variance matrix of 
$(t_1(x), t_2(x))$. There is no direct connection between Beran's tests and the 
Sobolev goodness-of-fit tests introduced here. The large-sample asymptotic 
distribution of $\langle \hat\theta_2, S_{22.1}^{-1} \hat\theta_2 \rangle$ is $\chi^2_{\dim E_2}$ and, in 
contrast to those Sobolev tests of goodness of fit characterized in Theorem 3 
below, Beran's tests are not consistent against all alternatives. 

Although Boulerice and Ducharme [3] presented their score tests of 
goodness of fit only for distributions on spheres and projective spaces, the 
generalization to distributions on general compact Riemannian manifolds is 
straightforward. Whereas $T_w$ is defined by (3.1), the statistics of Boulerice 
and Ducharme have the form 

\[ T_{BD} = h' \{ \mathrm{var}_{\hat\theta}(h) \}^{-1} h, \]

where 

\[ h = \frac{1}{n} \sum_{i=1}^{n} \Bigl( \frac{1}{\sqrt{f(x_i;\hat\theta)}}\, t(x_i)
       - E_0\bigl[ \sqrt{f(x;\hat\theta)}\, t(x) \bigr] \Bigr), \]

$E_0[\cdot]$ denoting expectation under the uniform distribution, and only finitely 
many $a_k$ are nonzero. Thus, whereas $T_w$ is based on a multiplicative 
transform of $t(x_i)$ which makes its mean of order $O(n^{-1/2})$ under the null 
hypothesis, $T_{BD}$ is based on a standardization of $t(x_i)$ which makes its mean zero 
and its variance matrix the identity under the null hypothesis. In contrast 
to those Sobolev tests of goodness of fit characterized in Theorem 3 below, 
the tests based on $T_{BD}$ are not consistent against all alternatives. One way 
of obtaining such consistency, mentioned on page 159 of [3], is to replace 
$T_{BD}$ by 

\[ T^{*}_{BD} = \frac{1}{n} \Bigl\| \sum_{i=1}^{n} \Bigl( \frac{1}{\sqrt{f(x_i;\hat\theta)}}\, t(x_i)
       - E_0\bigl[ \sqrt{f(x;\hat\theta)}\, t(x) \bigr] \Bigr) \Bigr\|^2, \]

where in (2.2) $a_k \ne 0$ for all $k$. Because of the need to calculate 
$E_0[\sqrt{f(x;\hat\theta)}\, t(x)]$, $T_{BD}$ and $T^{*}_{BD}$ are more complicated than $T_w$. 
3.2. Large-sample asymptotic properties. An appropriate setting for 
large-sample asymptotic results is that in which the mapping $t$ given by (2.2) is 
allowed to depend on the sample size $n$. Thus, there is a sequence 
$t_{(1)}, t_{(2)}, \ldots$ of mappings from $M$ into $L^2(M,\mu)$ of the form 

\[ t_{(n)}(x) = \sum_{k=1}^{\infty} a_{n,k}\, t_k(x), \tag{3.3} \]

where the sequences $\{a_{n,1}, a_{n,2}, \ldots\}$ of real numbers satisfy 

\[ \sum_{k=1}^{\infty} (a_{n,k})^2\, d(k) < \infty. \tag{3.4} \]

The corresponding goodness-of-fit statistic is the weighted Sobolev 
statistic (3.1) with $t$ replaced by $t_{(n)}$. If $t_{(1)}, t_{(2)}, \ldots$ converges to some limit $t$, 
then $T_w$ has a limiting distribution. This is made precise in Theorems 1 and 2 
below. 

Suppose that $x_1, \ldots, x_n$ are independent observations from some 
distribution $\nu$ on $M$. Let $\theta_\nu$ be the value of $\theta$ (assumed unique) such that 

\[ E_\nu[\psi(x;\theta)] = 0. \]

Then, under standard regularity assumptions (e.g., multivariate versions of 
those in Sections 4.2.2 and 7.2.2 of [16]) the following distributional result 
holds. 

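Theorem 1 (Asymptotic distribution). Let $t_{(1)}, t_{(2)}, \ldots$ and $t$ be 
mappings from $M$ into $L^2(M,\mu)$ given by (3.3) and (2.2), corresponding to 
sequences which satisfy (3.4) and (2.1). If 

\[ \sum_{k=1}^{\infty} (a_{n,k} - a_k)^2\, d(k) \to 0 \qquad \text{as } n \to \infty, \tag{3.5} \]

then 

\[ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{1}{f(x_i;\hat\theta)}\, \bigl( t_{(n)}(x_i) - \tau \bigr)
   \xrightarrow{d} N(0, \Sigma) \qquad \text{as } n \to \infty, \]

where $\xrightarrow{d}$ denotes convergence in distribution and 

\[ \Sigma = \mathrm{var}_\nu\Bigl[ \frac{1}{f(x;\theta_\nu)}\, \bigl( t(x) - \tau \bigr)
   - \psi(x;\theta_\nu)' \Bigl( E_\nu\Bigl[ \frac{\partial \psi(x;\theta)}{\partial \theta} \Big|_{\theta=\theta_\nu} \Bigr]^{-1} \Bigr)' V \Bigr] \]

with 

\[ \tau = E_\nu\Bigl[ \frac{1}{f(x;\theta_\nu)}\, t(x) \Bigr], \tag{3.6} \]

\[ V = E_\nu\Bigl[ \frac{1}{f(x;\theta_\nu)}\, \frac{\partial l(\theta;x)}{\partial \theta'} \Big|_{\theta=\theta_\nu}\, t(x) \Bigr], \]

$l(\theta;x)$ denoting the log likelihood of $\theta$ based on a single observation $x$. 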



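Proof. Taylor expansion of $\sum_{i=1}^{n} \psi(x_i;\hat\theta)$ about $\theta_\nu$ gives 

\[ \hat\theta - \theta_\nu = -\,k_\nu(\theta_\nu)^{-1}\, \frac{1}{n} \sum_{i=1}^{n} \psi(x_i;\theta_\nu) + O_p(n^{-1}), \]

where 

\[ k_\nu(\theta_\nu) = E_\nu\Bigl[ \frac{\partial \psi(x;\theta)}{\partial \theta} \Big|_{\theta=\theta_\nu} \Bigr]. \]

Then 

\[ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{1}{f(x_i;\hat\theta)}\, \bigl( t_{(n)}(x_i) - \tau \bigr) \]
\[ \quad = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{1}{f(x_i;\theta_\nu)}\, \bigl( t_{(n)}(x_i) - \tau \bigr)
   + \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \Bigl( \frac{1}{f(x_i;\hat\theta)} - \frac{1}{f(x_i;\theta_\nu)} \Bigr) \bigl( t_{(n)}(x_i) - \tau \bigr) \]
\[ \quad = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{1}{f(x_i;\theta_\nu)}\, \bigl( t_{(n)}(x_i) - \tau \bigr)
   + \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{\partial}{\partial \theta'} \frac{1}{f(x_i;\theta)} \Big|_{\theta=\theta_\nu} (\hat\theta - \theta_\nu)\, \bigl( t_{(n)}(x_i) - \tau \bigr)
   + O_p(n^{-1/2}) \]
\[ \quad = \frac{1}{\sqrt{n}} \Bigl\{ \sum_{i=1}^{n} \frac{1}{f(x_i;\theta_\nu)}\, \bigl( t_{(n)}(x_i) - \tau \bigr)
   - \sum_{i=1}^{n} \psi(x_i;\theta_\nu)' \bigl( k_\nu(\theta_\nu)^{-1} \bigr)' V \Bigr\}
   + O_p(n^{-1/2}). \]

Since $t$ and $\psi$ are continuous and $M$ is compact, application of the Hilbert 
space version of the limit theorem for triangular arrays (for the univariate 
version, see, e.g., Section 1.9.3 of [16]) to 

\[ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \Bigl\{ \frac{1}{f(x_i;\theta_\nu)}\, \bigl( t_{(n)}(x_i) - \tau \bigr)
   - \psi(x_i;\theta_\nu)' \bigl( k_\nu(\theta_\nu)^{-1} \bigr)' V \Bigr\} \]

shows that, as $n \to \infty$, 

\[ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{1}{f(x_i;\hat\theta)}\, \bigl( t_{(n)}(x_i) - \tau \bigr) \xrightarrow{d} N(0, \Sigma), \]

where 

\[ \Sigma = \mathrm{var}_\nu\Bigl[ \frac{1}{f(x;\theta_\nu)}\, \bigl( t(x) - \tau \bigr)
   - \psi(x;\theta_\nu)' \bigl( k_\nu(\theta_\nu)^{-1} \bigr)' V \Bigr]. \]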



The next two results are straightforward consequences of Theorem 1. 

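Theorem 2 (Asymptotic null distribution). Under the null hypothesis, 
if (3.5) holds, then: 

(i) $\tau = 0$, where $\tau$ is defined by (3.6). 

(ii) The distribution of $T_w$ tends as $n \to \infty$ to that of $\|Z\|^2$, where $Z$ is a 
random element of $L^2(M,\mu)$ with $Z \sim N(0, \Sigma_0)$, 

\[ \Sigma_0 = \mathrm{var}_\nu\Bigl[ \frac{1}{f(x;\theta_\nu)}\, t(x)
   - \psi(x;\theta_\nu)' \Bigl( E_\nu\Bigl[ \frac{\partial \psi(x;\theta)}{\partial \theta} \Big|_{\theta=\theta_\nu} \Bigr]^{-1} \Bigr)' V \Bigr] \]

with 

\[ V = E_0\Bigl[ \frac{\partial l(\theta;x)}{\partial \theta'} \Big|_{\theta=\theta_\nu}\, t(x) \Bigr], \]

$E_0[\cdot]$ denoting expectation under the uniform distribution. 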

In general, even for quite simple models, the operators $\Sigma$ and $\Sigma_0$ in 
Theorems 1 and 2 do not admit simple explicit expressions. The main use of 
Theorems 1 and 2 is the following consistency result. 

Theorem 3 (Consistency). If (3.5) holds, then the test which rejects 
the null hypothesis for large values of $T_w$ is consistent against an alternative 
distribution $\nu$ if and only if 

\[ E_\nu\Bigl[ \frac{1}{f(x;\theta_\nu)}\, t(x) \Bigr] \ne 0. \]

In particular, the test is consistent against all alternatives if and only if 
$a_k \ne 0$ for all $k$. 


4. The rotation group $SO(3)$. 






4.1. Sobolev tests on $SO(3)$. Two important Sobolev tests of uniformity 
on the rotation group $SO(3)$ are Downs' [4] generalization of the Rayleigh 
test and Prentice's [15] generalization of Giné's [7] $G_n$ test. See Section 13.2.2 
of [14]. For a sample $X_1, \ldots, X_n$ on $SO(3)$, these tests reject uniformity for 
large values of the Rayleigh statistic 

\[ T_R = 3n\, \mathrm{tr}(\bar{X}' \bar{X}), \]

where 

\[ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \]

and the Giné statistic 

\[ T_G = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n}
   \Bigl( \frac{1}{2} - \frac{3\pi}{32}\, \bigl[ \mathrm{tr}(I_3 - X_i' X_j) \bigr]^{1/2} \Bigr), \]

respectively. The corresponding goodness-of-fit tests reject the null 
hypothesis for large values of the weighted Rayleigh statistic 

\[ T_{wR} = 3n\, \mathrm{tr}(\bar{X}_w' \bar{X}_w), \]

where 

\[ \bar{X}_w = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{f(X_i;\hat\theta)}\, X_i, \]

and the weighted Giné statistic 

\[ T_{wG} = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n}
   \frac{1}{f(X_i;\hat\theta)\, f(X_j;\hat\theta)}
   \Bigl( \frac{1}{2} - \frac{3\pi}{32}\, \bigl[ \mathrm{tr}(I_3 - X_i' X_j) \bigr]^{1/2} \Bigr), \]

respectively. For $T_R$ and $T_{wR}$, $a_k = 0$ for $k \ge 2$; for $T_G$ and $T_{wG}$, all the 
$a_k$ are nonzero ([15], pages 173-174). It follows from Theorem 3 that the 
goodness-of-fit test based on $T_{wR}$ is consistent only against alternatives $\nu$ 
with $E_\nu[X] \ne 0$, whereas the test based on $T_{wG}$ is consistent against all 
alternatives. 
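
The weighted Rayleigh statistic is simple to compute once the fitted density 
has been evaluated at each observation. The Python sketch below is 
illustrative only: the function name is hypothetical, rotations are given as 
3 x 3 nested lists, and the fitted density values $f(X_i;\hat\theta)$ are assumed to be 
supplied by the caller as densities with respect to the uniform probability 
measure on $SO(3)$. 

```python
def weighted_rayleigh_so3(rotations, fitted_density_values):
    """T_wR = 3 n tr(Xbar_w' Xbar_w), Xbar_w = (1/n) sum_i X_i / f(X_i).

    rotations: list of 3x3 rotation matrices (nested lists).
    fitted_density_values: f(X_i; theta_hat) for each observation,
    assumed precomputed by the caller.
    """
    n = len(rotations)
    xbar_w = [[0.0] * 3 for _ in range(3)]
    for X, f in zip(rotations, fitted_density_values):
        for r in range(3):
            for c in range(3):
                xbar_w[r][c] += X[r][c] / (f * n)
    # tr(A'A) equals the sum of the squared entries of A.
    return 3.0 * n * sum(v * v for row in xbar_w for v in row)

if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    half_turn_z = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]
    # Unit weights give the unweighted Rayleigh statistic T_R.
    print(weighted_rayleigh_so3([identity, half_turn_z], [1.0, 1.0]))
```

Passing unit density values recovers $T_R$, so the same function serves for 
the test of uniformity. 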



4.2. A numerical example. The set of vectorcardiogram data described 
in [5] is a classic data set on $SO(3)$. The portion of this data set given by the 
orientations of vectorcardiograms obtained using the Frank lead system from 
boys aged 2-10 gives 28 observations on $SO(3)$. For these 28 observations, 
$T_R = 209$, so that comparison with the large-sample limiting $\chi^2_9$ distribution 
(which is appropriate for n > 18 by Table 1 of [8]) indicates very clearly that 
uniformity should be rejected. 

The eigenvalues of $\bar{X}$ are 0.957, 0.888 and 0.883, suggesting that it is 
appropriate to fit a matrix Fisher distribution with canonical parameter 
matrix of the form $\kappa U$, where $\kappa > 0$ and $U \in SO(3)$, that is, the probability 
density function is 

\[ f(X; U, \kappa) = M(\tfrac{1}{2}, 2, 4\kappa)^{-1}\, e^{\kappa} \exp\{ \kappa\, \mathrm{tr}(U'X) \}, \]

where $M(1/2, 2, \cdot)$ is a Kummer function. (See [4, 12], or Section 13.2.3 
of [14].) The maximum likelihood estimates of $\kappa$ and $U$ are $\hat\kappa = 5.63$ and 

\[ \hat{U} = \begin{pmatrix} 0.583 & 0.629 & 0.514 \\ 0.660 & -0.736 & 0.151 \\ 0.473 & 0.252 & -0.844 \end{pmatrix}. \]

The p-values (based on 1000 simulations) of the goodness-of-fit tests are 
0.169 for the weighted Rayleigh test and 0.126 for the weighted Giné test, 
each indicating clearly that the fit is acceptable. 

Acknowledgments. I am grateful to Professor T. D. Downs for giving me 
access to the vectorcardiogram data and to a referee for the suggestion of 
allowing the mapping t to depend on the sample size. 